

bd3611971089d466ab4ca96a20f7ab13-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems

B.1 Applying ViLT to Multi-Choice Tasks
B.1.1 Applying ViLT to VCR
The VCR task provides object boxes, with each box corresponding to a grounded entity in the question. We use consistent mappings between the box colors and object names; for example, the [person1] object is always referenced with a green box in the image and the name Casey in the text. During training and inference, each possible answer a_i is paired with the question q to form the sequence "[CLS] q [SEP] a_i". For vision-only tasks, we found that simply using "This is an image." as the text input works well. We also conduct ablation studies that include two baselines: (1) not inputting any image to ViLT at all, and (2) inputting a zero-vector image instead of the average image of the COCO dataset.
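A minimal sketch of this answer-scoring scheme, using the Hugging Face ViltProcessor and ViltModel. The checkpoint, the untrained scoring head, and the input file are illustrative assumptions, not the benchmark's actual code:

import torch
from PIL import Image
from transformers import ViltProcessor, ViltModel

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-mlm")
model = ViltModel.from_pretrained("dandelin/vilt-b32-mlm")
# Task head scoring the pooled [CLS] state; in practice it would be trained on VCR.
score_head = torch.nn.Linear(model.config.hidden_size, 1)

image = Image.open("scene.jpg")  # hypothetical VCR image
question = "Why is [person1] pointing at [person2]?"
answers = [
    "He is telling [person3] that [person2] ordered the pancakes.",
    "He just told a joke.",
    "He is feeling accusatory.",
    "He is giving directions.",
]

scores = []
with torch.no_grad():
    for ans in answers:
        # Pair each candidate answer with the question: "[CLS] q [SEP] a_i".
        # ViLT's text encoder is short (40 positions), hence the truncation.
        inputs = processor(images=image, text=f"{question} [SEP] {ans}",
                           truncation=True, max_length=40, return_tensors="pt")
        pooled = model(**inputs).pooler_output
        scores.append(score_head(pooled).item())

prediction = max(range(len(answers)), key=scores.__getitem__)

The highest-scoring (q, a_i) pair is taken as the predicted answer.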


CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks

Neural Information Processing Systems

This assumption means learning separate models for language-only, vision-only, and vision-language tasks, as opposed to a single "generalist" model that can handle all modalities or subsets of them [Reed et al., 2022]. Yet, existing work suggests that knowledge grounded in multiple modalities can benefit unimodal tasks [Desai and Johnson, 2021, Jin et al., 2022].


CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks

Neural Information Processing Systems

Current state-of-the-art vision-and-language models are evaluated on tasks either individually or in a multi-task setting, overlooking the challenges of continually learning (CL) tasks as they arrive. Existing CL benchmarks have facilitated research on task adaptation and mitigating catastrophic forgetting, but are limited to vision-only and language-only tasks. We present CLiMB, a benchmark to study the challenge of learning multimodal tasks in a CL setting, and to systematically evaluate how upstream continual learning can rapidly generalize to new multimodal and unimodal tasks. CLiMB includes implementations of several CL algorithms and a modified Vision-Language Transformer (ViLT) model that can be deployed on both multimodal and unimodal tasks. We find that common CL methods can help mitigate forgetting during multimodal task learning, but do not enable cross-task knowledge transfer. We envision that CLiMB will facilitate research on a new class of CL algorithms for this challenging multimodal setting.
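A minimal sketch of the upstream continual-learning protocol the abstract describes: train on tasks in sequence (here, naive sequential fine-tuning, the baseline that exhibits forgetting), then re-evaluate every task seen so far. The toy tasks and two-layer model are illustrative stand-ins, not CLiMB's actual code:

import torch
from torch import nn

def upstream_cl(model, tasks, opt, epochs=3):
    """Returns acc[i][j] = accuracy on task j after finishing task i."""
    loss_fn = nn.CrossEntropyLoss()
    acc = []
    for i, (train, _) in enumerate(tasks):
        for _ in range(epochs):
            for x, y in train:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
        row = []
        with torch.no_grad():
            for _, test in tasks[: i + 1]:  # re-evaluate all seen tasks
                hits = sum((model(x).argmax(-1) == y).sum().item() for x, y in test)
                total = sum(y.numel() for _, y in test)
                row.append(hits / total)
        acc.append(row)
    return acc

# Toy usage: two synthetic 2-class tasks with shifted input distributions.
def make_task(shift, n=256):
    x = torch.randn(n, 8) + shift
    y = (x.sum(-1) > 8 * shift).long()
    b = [(x[i:i + 32], y[i:i + 32]) for i in range(0, n, 32)]
    return b[:6], b[6:]  # train batches, test batches

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
acc = upstream_cl(model, [make_task(0.0), make_task(3.0)], opt)
forgetting_task0 = acc[0][0] - acc[1][0]  # accuracy lost on task 0 after task 1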


WikiDBGraph: A Data Management Benchmark Suite for Collaborative Learning over Database Silos

Wu, Zhaomin, Wang, Ziyang, He, Bingsheng

arXiv.org Artificial Intelligence

Relational databases are often fragmented across organizations, creating data silos that hinder distributed data management and mining. Collaborative learning (CL) -- techniques that enable multiple parties to train models jointly without sharing raw data -- offers a principled approach to this challenge. However, existing CL frameworks (e.g., federated and split learning) remain limited in real-world deployments. Current CL benchmarks and algorithms primarily target the learning step under assumptions of isolated, aligned, and joinable databases, and they typically neglect the end-to-end data management pipeline, especially preprocessing steps such as table joins and data alignment. In contrast, our analysis of the real-world corpus WikiDBs shows that databases are interconnected, unaligned, and sometimes unjoinable, exposing a significant gap between CL algorithm design and practical deployment. To close this evaluation gap, we build WikiDBGraph, a large-scale dataset constructed from 100,000 real-world relational databases linked by 17 million weighted edges. Each node (database) and edge (relationship) is annotated with 13 and 12 properties, respectively, capturing a hybrid of instance- and feature-level overlap across databases. Experiments on WikiDBGraph demonstrate both the effectiveness and limitations of existing CL methods under realistic conditions, highlighting previously overlooked gaps in managing real-world data silos and pointing to concrete directions for practical deployment of collaborative learning systems.
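The node/edge annotation scheme lends itself to a standard property-graph representation. A minimal sketch with networkx; the property names and database IDs below are illustrative assumptions, not WikiDBGraph's actual schema:

import networkx as nx

G = nx.Graph()
# Nodes are databases, annotated with (in the real dataset) 13 properties.
G.add_node("db_00001", num_tables=12, domain="sports")
G.add_node("db_00002", num_tables=7, domain="sports")
# Edges are weighted relationships annotated with 12 properties,
# capturing instance- and feature-level overlap between databases.
G.add_edge("db_00001", "db_00002", weight=0.83, shared_columns=5)

# Example query: rank a silo's neighbors as candidate partners
# for collaborative learning, strongest relationship first.
partners = sorted(G["db_00001"].items(), key=lambda kv: -kv[1]["weight"])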


A Task Details

Neural Information Processing Systems

… ViLT for each task, and details about how low-shot versions of each task are sampled.
B.1 Applying ViLT to Multi-Choice Tasks
B.1.1 Applying ViLT to VCR
We follow previous work [Zellers et al., 2021, Hessel et al., 2022] and draw colored boxes directly on the image to mark the grounded text references, e.g., [person1]. We follow the original implementations [Zellers et al., 2019b, Bisk et al., 2020] to model these tasks.
B.2 Applying ViLT to Unimodal Tasks
We conduct low-shot experiments to test the model's transferability to unimodal tasks. However, different sub-samples of the training set may lead to different results. For vision-only tasks, we found that simply using "This is an image." as the text input works well. We also conduct ablation studies that include two baselines: (1) not inputting any image to ViLT at all, and (2) inputting a zero-vector image instead of the average image of the COCO dataset.
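A minimal sketch of the two image-input baselines for language-only tasks. Tensor shapes and the stand-in COCO batch are illustrative assumptions; ViLT's real preprocessing resizes and normalizes RGB images before patch embedding:

import torch

H = W = 384
zero_image = torch.zeros(3, H, W)   # baseline (2): zero-vector image

# The default input, the "average image of the COCO dataset", is the
# per-pixel mean over COCO; here a random batch stands in for real images.
coco_like = torch.rand(16, 3, H, W)
average_image = coco_like.mean(dim=0)

# Baseline (1), no image at all, instead drops the image branch entirely
# and feeds only text tokens to the transformer.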



Efficient Continual Learning in Keyword Spotting using Binary Neural Networks

Vu, Quynh Nguyen-Phuong, Martinez-Rau, Luciano Sebastian, Zhang, Yuxuan, Tran, Nho-Duc, Oelmann, Bengt, Magno, Michele, Bader, Sebastian

arXiv.org Artificial Intelligence

Keyword spotting (KWS) is an essential function that enables interaction with ubiquitous smart devices. However, in resource-limited devices, KWS models are often static and thus cannot adapt to new scenarios, such as added keywords. To overcome this problem, we propose a Continual Learning (CL) approach for KWS built on Binary Neural Networks (BNNs). The framework leverages the reduced computation and memory requirements of BNNs while incorporating techniques that enable the seamless integration of new keywords over time. This study evaluates seven CL techniques on a 16-class use case, reporting an accuracy exceeding 95% for a single additional keyword and up to 86% for four additional classes. Sensitivity to the number of training samples in the CL phase and differences in computational complexity are also evaluated. These evaluations demonstrate that batch-based algorithms are more sensitive to the CL dataset size, and that the differences in computational complexity are insignificant. These findings highlight the potential of developing an effective and computationally efficient technique for continuously integrating new keywords in KWS applications that is compatible with resource-constrained devices.
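One concrete ingredient of integrating new keywords over time is growing the classifier head while preserving the weights of previously learned classes. A minimal PyTorch sketch (illustrative only; the paper's binary layers and specific CL techniques are not reproduced here):

import torch
from torch import nn

def expand_head(head: nn.Linear, num_new: int) -> nn.Linear:
    """Add output units for new keywords, copying the old class weights."""
    new = nn.Linear(head.in_features, head.out_features + num_new)
    with torch.no_grad():
        new.weight[: head.out_features] = head.weight
        new.bias[: head.out_features] = head.bias
    return new

head = nn.Linear(64, 16)      # 16-class KWS head, as in the use case above
head = expand_head(head, 1)   # one added keyword -> 17 classes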


Sequence Transferability and Task Order Selection in Continual Learning

Nguyen, Thinh, Nguyen, Cuong N., Pham, Quang, Nguyen, Binh T., Ramasamy, Savitha, Li, Xiaoli, Nguyen, Cuong V.

arXiv.org Artificial Intelligence

In continual learning, understanding the properties of task sequences and their relationships to model performance is important for developing advanced algorithms with better accuracy. However, efforts in this direction remain underdeveloped despite encouraging progress in methodology development. In this work, we investigate the impact of sequence transferability on continual learning and propose two novel measures that capture the total transferability of a task sequence, in either the forward or backward direction. Based on the empirical properties of these measures, we then develop a new method for the task order selection problem in continual learning. Our method is shown to offer better performance than the conventional strategy of random task selection.
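As an illustration of the task order selection problem, suppose pairwise transferability estimates T[i][j] (from training on task i to learning task j) are available; an ordering can then be scored by its total forward transfer. The matrix and the sum-based objective below are illustrative assumptions, not the paper's actual measures or method:

import itertools

T = [[0.0, 0.6, 0.2],   # T[i][j]: estimated transfer from task i to task j
     [0.1, 0.0, 0.7],
     [0.3, 0.4, 0.0]]

def total_forward_transfer(order):
    # Sum transferability from every earlier task to every later task.
    return sum(T[a][b] for i, a in enumerate(order) for b in order[i + 1:])

best_order = max(itertools.permutations(range(len(T))), key=total_forward_transfer)

Exhaustive search is only feasible for short sequences; a greedy or learned selection rule would replace it at scale.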




Persistent Backdoor Attacks in Continual Learning

Guo, Zhen, Kumar, Abhinav, Tourani, Reza

arXiv.org Artificial Intelligence

Backdoor attacks pose a significant threat to neural networks, enabling adversaries to manipulate model outputs on specific inputs, often with devastating consequences, especially in critical applications. While backdoor attacks have been studied in various contexts, little attention has been given to their practicality and persistence in continual learning, particularly in understanding how continual updates to model parameters, as new data distributions are learned and integrated, impact the effectiveness of these attacks over time. To address this gap, we introduce two persistent backdoor attacks, Blind Task Backdoor and Latent Task Backdoor, each leveraging minimal adversarial influence. Our blind task backdoor subtly alters the loss computation without direct control over the training process, while the latent task backdoor influences only a single task's training, with all other tasks trained benignly. We evaluate these attacks under various configurations, demonstrating their efficacy with static, dynamic, physical, and semantic triggers. Our results show that both attacks consistently achieve high success rates across different continual learning algorithms, while effectively evading state-of-the-art defenses, such as SentiNet and I-BAU.